BlendDB: Blending Table Layouts to Support Efficient Browsing of Relational Databases

Authors

  • Adam Marcus
  • Samuel R. Madden
  • David R. Karger
  • Terry P. Orlando
Abstract

The physical implementation of most relational databases follows their logical description, where each relation is stored in its own file or collection of files on disk. Such an implementation is good for queries that filter or aggregate large portions of a single table, and provides reasonable performance for queries that join many records from one table to another. It is much less suitable, however, for join queries that follow paths from a small number of tuples in one table to small collections of tuples in other tables to accumulate facts about a related collection of objects (e.g., co-authors of a particular author in a publications database), since answering such queries involves one or more random I/Os per table involved in the path. If the primary workload of a database consists of many such path queries, as is likely to be the case when supporting browsing-oriented applications, performance will be quite poor. This thesis focuses on optimizing the performance of these kinds of path queries in a system called BlendDB, a relational database that supports on-disk co-location of tuples from different relations. To make BlendDB efficient, the thesis will propose a clustering algorithm that, given knowledge of the database workload, co-locates the tuples of multiple relations if they join along common paths. To support the claim of improved performance, the thesis will include experiments in which BlendDB provides better performance than traditional relational databases on queries against the IMDB movie dataset. Additionally, this thesis will show that BlendDB provides performance commensurate with materialized views while using less disk space, and can achieve better performance than materialized views in exchange for more disk space when users navigate between related items in the database.

Thesis Supervisor: Samuel R. Madden
Title: Associate Professor of Electrical Engineering and Computer Science

Thesis Supervisor: David R. Karger
Title: Professor of Electrical Engineering and Computer Science
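
The I/O argument in the abstract can be illustrated with a small page-counting model. The sketch below is not BlendDB's storage format or its clustering algorithm; the data, the function names, and the assumption that every looked-up tuple in a per-table layout sits on its own page are all hypothetical and chosen only to make the contrast visible.

```python
# Toy data (hypothetical): (author, paper) authorship pairs.
AUTHORSHIP = [
    ("alice", "p1"), ("bob", "p1"),
    ("alice", "p2"), ("carol", "p2"),
    ("dave", "p3"),
]


def papers_of(author):
    """First hop of the path: author -> papers."""
    return {p for a, p in AUTHORSHIP if a == author}


def coauthors(author):
    """Full path query: author -> papers -> co-authors."""
    papers = papers_of(author)
    return {a for a, p in AUTHORSHIP if p in papers and a != author}


def tuples_on_path(author):
    """Count the tuples visited: the author, each paper, each co-author."""
    return 1 + len(papers_of(author)) + len(coauthors(author))


def pages_read_per_table(author):
    """Assumption: one relation per file, and every visited tuple lies on
    a different page, so each visited tuple costs one random page read."""
    return tuples_on_path(author)


def pages_read_colocated(author, page_capacity):
    """Assumption: all tuples on the path were clustered onto consecutive
    pages that each hold `page_capacity` tuples."""
    return -(-tuples_on_path(author) // page_capacity)  # ceiling division
```

Under these assumptions, the co-author query for `alice` visits five tuples: five page reads in the per-table layout versus one when a page holds at least five co-located tuples. Real costs depend on buffering, indexes, and page size, which this model ignores.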


Similar resources

An Efficient Approach for Providing multidimensional Data on Relational DBMSs

Multidimensional data are widely used to support OLAP (On-Line Analytical Processing) operations efficiently. Many OLAP tools have been developed to manipulate multidimensional data and provide users with useful information for decision making. While they are equipped with useful features, they can sometimes seem complicated to relational database users. In this paper, we developed an easy...


Layout Optimization for Distributed Relational Databases Using Machine Learning



An Efficient Data Extraction and Storage Utility For XML Documents

In this paper, a mechanism to provide selective extraction of data objects from XML documents, the storage of these documents in an object-relational database, and retrieval and reconstruction of XML documents from extracted data objects is discussed. The motivation is provided by a need for a Workflow Process Repository in a Workflow Management System (WFMS) [6], namely METEOR WFMS, to store m...


An Introduction to the TriggerMan Asynchronous Trigger Processor

A new type of system for testing trigger conditions and running trigger actions outside of a DBMS is proposed in this paper. Such a system is called an asynchronous trigger processor since it processes triggers asynchronously, after triggering updates have committed in the source database. The architecture of a prototype asynchronous trigger processor called TriggerMan is described. TriggerMan ...


Fuzzy Multi-Join and Top-K Query Model for Search-As-You-Type in Multiple Tables

A search-as-you-type system determines answers on the fly as a user types in a keyword query, character by character. There is a growing need to support search-as-you-type on data residing in a relational DBMS. Existing work on keyword queries focuses on supporting this type of search using native database SQL. The aim of leveraging existing database functionalities is to meet high performa...




Publication date: 2008